P. Alexander Burnham

The following is a report for my analysis of the 2017 USDA NASS dataset and USDA ARMS survey. The goal is to explore the distribution of new and beginning farmers and their income and wealth statistics.

Question 1:

Use the 2017 USDA NASS dataset to examine new and beginning farmer (less than 11 years of experience) principal operators, as well as other relevant NASS data to answer ONE of the following questions: In which states are the greatest proportion of farmers considered new and beginning?

Load In Data from NASS Dataset
Here I selected the number of principle producers from each state with less than and greater than or equal to 11 years of experience. The sum of the two being the total number of principle producers in each state. I am using an an interface in R to access the NASS API using an API key provided by usda.gov. This ensures the code operates without a separate data repository and ensures the data are up to date when accessed. As the files are 5gb this reduces my overhead and simplifies reproducibility for other potential collaborators.

# get principle producers with less than 11 years experience -state level
greatThan_11 <- nass_data(year = 2017,
          short_desc = "PRODUCERS, PRINCIPAL, YEARS ON ANY OPERATION, LT 11 YEARS - NUMBER OF PRODUCERS",
          agg_level_desc = "STATE")

# get principle producers with greater than or equal to 11 years experience -state level
lessThan_11 <- nass_data(year = 2017,
          short_desc = "PRODUCERS, PRINCIPAL, YEARS ON ANY OPERATION, GE 11 YEARS - NUMBER OF PRODUCERS",
          agg_level_desc = "STATE")

Merge datasets, select required columns
Here I do the merge operation to combine the data and simplify the dataset for my purposes.

# change value to greater11
greatThan_11 <- greatThan_11 %>% 
       rename("greater11" = "Value")

# change value to less11
lessThan_11 <- lessThan_11 %>% 
       rename("less11" = "Value")

# merge datasets by state alpha
experienceMerged <- merge(greatThan_11, lessThan_11, by = "state_alpha")

# select state alpha, greater11, and less11
experienceClean <- dplyr::select(experienceMerged, less11, greater11, state_alpha, state_fips_code.x)

Create percentage variable
Here I ensure the variables of interest are numeric and create the percentage variable required.

# remove commas and convert to numeric vectors
experienceClean$less11 <- as.numeric(gsub(",","",experienceClean$less11))
experienceClean$greater11 <- as.numeric(gsub(",","",experienceClean$greater11))
experienceClean$fips <- as.numeric(experienceClean$state_fips_code.x)

# calculate the percentage here
experienceClean$percentNew <- (experienceClean$less11/(experienceClean$less11 + experienceClean$greater11)) * 100

Let’s make a choropleth to explore the spatial distribution of this variable.
Here I implement a plotly choropleth. It has some interactivity built into it. The ability to hover over states to see the state code and value is included as well as basic zooming and saving features. I find it to be a good exploratory plotting tool.

Percent of New and Beginning Principle Farm Operators in 2017

Let’s plot a ranked bar plot with an average threshold line
While the spatial plot is very helpful for looking for potential geographic patterns, it makes looking for max and min values more difficult at a glance. Here I calculated the national proportion of new primary producers to to all producers. I use a proportion test to calculate the confidence interval and find three sigma to find the interval between which 99.7 of the data will be found in a normal distribution.

# find average percentage nation wide
new <- sum(experienceClean$less11)
total <- (sum(experienceClean$less11) + sum(experienceClean$greater11)) 

# find average
avg <- (new / total)*100

# find confidence interval to calculate 3 sigma
prop <- prop.test(new, total, correct=FALSE)
threeSig <- (((prop$conf.int[2] - prop$conf.int[1])/4)*3)*100

# create a variable for our three sigma calculation
experienceClean$threeSigCat <- ifelse(experienceClean$percentNew>avg+threeSig, "Above", ifelse(experienceClean$percentNew<avg-threeSig, "Below", "Average"))

Ranked Percent New and Beginning Primary Farm Operators for 2017

Is there a difference between the state’s proportions?
With a significant p-value (p < 0.0001) I reject the null hypothesis and there appears to be evidence that the states differ in the proportion of new and beginning farmers.

## 
##  50-sample test for equality of proportions without continuity
##  correction
## 
## data:  experienceClean$less11 out of (experienceClean$less11 + experienceClean$greater11)
## X-squared = 15825, df = 49, p-value < 2.2e-16
## alternative hypothesis: two.sided
## sample estimates:
##    prop 1    prop 2    prop 3    prop 4    prop 5    prop 6    prop 7    prop 8 
## 0.5676856 0.7163749 0.7286361 0.7947788 0.7414086 0.7149393 0.7191781 0.8161249 
##    prop 9   prop 10   prop 11   prop 12   prop 13   prop 14   prop 15   prop 16 
## 0.7036227 0.6845286 0.7041925 0.8036842 0.7232812 0.7891885 0.7767058 0.7793540 
##   prop 17   prop 18   prop 19   prop 20   prop 21   prop 22   prop 23   prop 24 
## 0.7352635 0.7201429 0.7502462 0.7596422 0.6946287 0.7686988 0.8108141 0.7619867 
##   prop 25   prop 26   prop 27   prop 28   prop 29   prop 30   prop 31   prop 32 
## 0.7387001 0.7892362 0.7479764 0.8123275 0.8006048 0.7174619 0.7757881 0.7635868 
##   prop 33   prop 34   prop 35   prop 36   prop 37   prop 38   prop 39   prop 40 
## 0.7381697 0.7603600 0.7631999 0.7207460 0.7385124 0.7761268 0.7322558 0.7310627 
##   prop 41   prop 42   prop 43   prop 44   prop 45   prop 46   prop 47   prop 48 
## 0.8132155 0.7493078 0.7272972 0.7345957 0.7548948 0.7194515 0.7522374 0.8048102 
##   prop 49   prop 50 
## 0.7108951 0.7363196

Summary

The top five states with the highest percentage of new farmers in descending rank order were Delaware, South Dakota, North Dakota, Minnesota, and Wisconsin. The national percentage was 75.37% new and beginning farmers. In general, it seemed as though the highest influx of new farmers has been in the northern Midwest. As one of the largest farming centers in the country, this makes some sense. All states where either significantly higher or lower (3-sigma = 0.077%) than the national estimate. 21 states were below the 3 sigma margin and the remaining 29 were above. This may indicate that farming, varies culturally, environmentally, legislatively etc. so much from state to state that the national average is actually a poor population parameter estimate to compare to individual states. A proportion test indicated that there was a highly significant difference between the proportions of new and beginning farmers among states (p < 0.0001) In future and with more time, more complicated statistical models should be developed with other meaningful social, political and geographic parameters at the state level in order to examine further patterns in this variable.

Question 2:

Using data from the USDA ARMS survey on farm income and wealth statistics, assess the relationship between new and beginning farmers and federal government direct farm payments. To what extent is there a relationship between new and beginning farmer populations and federal government payments? Where are these payments most common?

Get Data from the ARMS API
Here I use a REST POST request to grab the data from the USDA ARMS API as a json file. I selected the three most recent years where direct payments were still being disbursed (ended in 2014). These are not perfectly in line with our 2017 new and beginning data but taking time lags and the speed of farmer turnover into account, some interesting patterns may still be observable. I requested state-level direct payment data and included economic class and production specialty as covariates.


# my API endpoint
https://api.ers.usda.gov/data/arms/surveydata?api_key=My_Key

# my POST request:
{
    "year": [2011,2012,2013],
    "state": ["Alabama", "Alaska", "American Samoa", "Arizona", "Arkansas", "California", "Colorado", "Connecticut", "Delaware", "District of Columbia", "Florida", "Georgia", "Guam", "Hawaii", "Idaho", "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana", "Maine", "Maryland", "Massachusetts", "Michigan", "Minnesota", "Minor Outlying Islands", "Mississippi", "Missouri", "Montana", "Nebraska", "Nevada", "New Hampshire", "New Jersey", "New Mexico", "New York", "North Carolina", "North Dakota", "Northern Mariana Islands", "Ohio", "Oklahoma", "Oregon", "Pennsylvania", "Puerto Rico", "Rhode Island", "South Carolina", "South Dakota", "Tennessee", "Texas", "U.S. Virgin Islands", "Utah", "Vermont", "Virginia", "Washington", "West Virginia", "Wisconsin", "Wyoming"],
    "report": "government payments",
    "category": ["economic class", "production specialty"],
    "variable": "direct payments"

}

Read in Data and Select Relevant Columns
Here I read in the data requested and select the columns for the analysis. The estimate variable is in 1000s of dollars so I create a new column taking this scaling into account.

# Loading packages
library(jsonlite)

# read in data as a data frame
arms <- jsonlite::fromJSON("ARMS_direct_payment.json")$data

# select data
armsClean <- dplyr::select(arms, year, state, category, category_value, estimate)

# estimate in 1000s of dollars
armsClean$payments <- armsClean$estimate * 1000


# Split data by category
split <- split(armsClean, armsClean$category)
specialty <- dplyr::select(split$`Production Specialty`, year, state, payments, category_value)
class <- dplyr::select(split$`Economic Class`, year, state, payments, category_value)

Lets plot payments by year for both categories
I summarize both datasets by year, state, and category_value using dplyr group_by.

## # A tibble: 6 × 6
## # Groups:   year [2]
##    year category_value            mean        sd     n        se
##   <int> <chr>                    <dbl>     <dbl> <int>     <dbl>
## 1  2011 $1,000,000 or more   50601333. 34663388.    15  8950048.
## 2  2011 $100,000 to $249,999 18243400  16331102.    15  4216672.
## 3  2011 $250,000 to $499,999 29732600  23543001.    15  6078777.
## 4  2011 $500,000 to $999,999 44638800  33774248.    15  8720473.
## 5  2011 Less than $100,000   14070600  11217099.    15  2896243.
## 6  2012 $1,000,000 or more   70089667. 44077898.    15 11380864.

Average Direct Payment Amount Over Time by Farm Economic Class

Average Direct Payment Amount Over Time by Farm Primary Product

Let’s run a few quick statistical models to examine the patterns observed in those two figures above. The distribution is heavily skewed right for both datasets. I will model the distributions using a gamma distribution to account for this.

# first with economic class
modClass <- glm(data = class, payments~year * category_value, family = "Gamma")
Anova(modClass, test.statistic="LR") # likelihood ratio test
## Analysis of Deviance Table (Type II tests)
## 
## Response: payments
##                     LR Chisq Df Pr(>Chisq)    
## year                   0.081  1     0.7754    
## category_value       102.114  4     <2e-16 ***
## year:category_value    3.875  4     0.4232    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# remove 0s for gamma dist.
specNo0 <- specialty[specialty$payments>0,]

# next producer specialty
modProd <- glm(data = specNo0, payments~year * category_value, family = "Gamma")
Anova(modProd, test.statistic="LR") # likelihood ratio test
## Analysis of Deviance Table (Type II tests)
## 
## Response: payments
##                     LR Chisq Df Pr(>Chisq)    
## year                    0.10  1      0.754    
## category_value        367.32 11     <2e-16 ***
## year:category_value     5.31 11      0.915    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Merge the datasets from questions 1 and 2
I will examine a state’s proportion of new farmers to the amount of money in direct payments received in 2013 (the most recent year it was issued). I will use the economic class dataset in this merge with economic class as a potential covariate of interest.

# add state name to experience dataset
experienceClean$state <- state.name[match(experienceClean$state_alpha,state.abb)] 

# select experience vars for merge
expSimple <- dplyr::select(experienceClean, percentNew, state)

# do the merge for all x
expClass <- merge(x=class, y=expSimple, all.x=T, all.y = F, by = "state")

Direct Payments by Percent Primary Producers as a function of Economic Class

Let’s Examine the Significance of this Apparent Trend

# Experience model
modExp <- glm(data = expClass[expClass$year==2013,], payments~percentNew * category_value, family = "Gamma")
Anova(modExp, test.statistic="LR") # likelihood ratio test
## Analysis of Deviance Table (Type II tests)
## 
## Response: payments
##                           LR Chisq Df Pr(>Chisq)    
## percentNew                  21.734  1  3.133e-06 ***
## category_value              46.025  4  2.434e-09 ***
## percentNew:category_value   23.198  4  0.0001156 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Let’s pull the count data for direct payments at the state level
Which states had the highest number of payments in 2013?

## # A tibble: 15 × 2
##    state            sum
##    <chr>          <dbl>
##  1 Iowa           44279
##  2 Illinois       34246
##  3 Minnesota      32844
##  4 Wisconsin      23590
##  5 Nebraska       23149
##  6 Kansas         19587
##  7 Indiana        19026
##  8 Missouri       18456
##  9 Texas          12449
## 10 Georgia         5387
## 11 North Carolina  5004
## 12 Arkansas        4161
## 13 Washington      2435
## 14 California      2369
## 15 Florida          562

Summary

Only 15 states in the last 3 years of the program received funding. Arkansas, California, Florida, Georgia, Illinois, Indiana, Iowa, Kansas, Minnesota, Missouri, Nebraska, North Carolina, Texas, Washington, and Wisconsin. There were significant effects of economic class and production specialty on government direct payments (p < 0.0001), however, for the years I selected (2011-2013) there was no effect of year detected. The interaction effects of year on economic class and production specialty were also insignificant with this data selection. On average, farms with a higher economic class value (potentially larger commercial farms) and corn producers received the highest payments overall. To determine if these were significant, a corrected pairwise comparisons post-hoc test should be conducted. On average, states with higher percent new and beginning farmers in 2017 received higher direct payments in 2013 (x^2 = 21.7, p < 0.0001). There was also a significant difference in average payments between economic classes (x^2 = 46.0, p < 0.0001). A significant interaction between economic class and percent new primary producers indicates that the positive linear trends between these variables differ based on the economic class they are associated with (x^2 = 23.2, p = 0.0001). The states with the highest frequency of direct payments were Iowa, Illinois, Minnesota, Wisconsin, and Nebraska in decreasing rank order. The highest dollar amounts in terms of direct payments were also centered around Midwestern states like Illinois, Nebraska, Iowa. These were correlated with the highest corn producing states and the areas of highest percent new farmers as indicated by our choropleth. These significant effects of percent new and beginning farmers on direct payouts are likely correlated spatially with significant areas of farming. Further work would need to be done to attempt to parse the true nature of this relationship.